Goto

Collaborating Authors

 random seed


Structured Temporal Causality for Interpretable Multivariate Time Series Anomaly Detection

Neural Information Processing Systems

Real-world multivariate time series anomalies are rare and often unlabeled. Additionally, prevailing methods rely on increasingly complex architectures tuned to benchmarks, detecting only fragments of anomalous segments and overstating performance. In this paper, we introduce OracleAD, a simple and interpretable unsupervised framework for multivariate time series anomaly detection. OracleAD encodes each variable's past sequence into a single causal embedding to jointly predict the present time point and reconstruct the input window, effectively modeling temporal dynamics. These embeddings then undergo self-attention mechanism to project them into a shared latent space and capture spatial relationships.


51790e459ce50a8f7182b46e2fd29a95-Paper-Conference.pdf

Neural Information Processing Systems

How should we evaluate the quality of generative models? Many existing metrics focus on a model's producibility, i.e. the quality and breadth of outputs it can generate. However, the actual value from using a generative model stems not just from what it can produce but whether a user with a specific goal can produce an output that satisfies that goal. We refer to this property as steerability. In this paper, we first introduce a mathematical decomposition for quantifying steerability independently from producibility.


How Data Augmentation Shapes Neural Representations

arXiv.org Machine Learning

Data augmentation is widely recognized for improving generalization in deep networks, yet its impact on the geometry of learned representations remains poorly understood. In this work, we characterize how different data augmentation strategies reshape internal representations in neural networks. Using tools from shape analysis, we embed network hidden representations into a metric space where distance is invariant to scaling, translation, rotation and reflection. We show that increasing augmentation strength leads to well-behaved trajectories in this space, and that different augmentation types steer representations in distinct directions. Moreover, we investigate how neural representation shapes are distorted along data augmentation trajectories, and show that insights from neural geometry can predict which representations provide the most improvement when ensembling models. Our results reveal shared geometric patterns across architectures and seeds, and suggest that analyzing shape-space trajectories offers a principled tool for understanding and comparing data augmentation methods.


TabCF: Distributional Control Function Estimation with Tabular Foundation Models

arXiv.org Machine Learning

Instrumental variable (IV) and control function (CF) methods are powerful tools for causal effect estimation in the presence of unmeasured confounding, yet most existing approaches target only mean effects and/or demand substantial fitting and tuning effort. In this paper, we introduce a simple method, TabCF, for control function regression using tabular foundation models, which enables accurate, fast, identification-transparent, and tuning-light causal estimation of distributional quantities, such as interventional means and quantiles; we also propose a copula-based approximation for multivariate outcomes. TabCF performs favorably against representative methods across a broad range of small- to medium-sized synthetic and real data scenarios. The central message is two-fold: for practitioners, it highlights that TabCF is an effective tool for distributional causal inference; for researchers, it suggests that the proposed approach could be considered a strong baseline for future method development. Code is available at https://github.com/GepingChen/TabCF.


Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Neural Information Processing Systems

Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalanceis based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing.


The proposition makes use of the following observation: For the discriminator defined in (1), the norm of gradient for wt is upper bounded by k wtDθ(x)k F kxk LY

Neural Information Processing Systems

The upper bound of gradient's Frobenius norm for spectrally-normalized discriminators follows directly. As lw(x) is a linear transformation, we have lcw(x) = c lw(x), and lw(cx) = c lw(x). Moreover, since ReLU and leaky ReLU is linear in R+ and R region, we have ai(cx) = c ai(x). In this section we discuss the gradients with respect the actual parameter wi. From Eq. (12) in [30] we know wtDθ(x) = A, we know that w0tDθ(x) F, otl(x)Dθ(x), and kotl (x)k have upper bounds. From Theorem 1.1 in [44] we know that if wt is initialized with i.i.d random variables from uniform or Gaussian distribution, E kwtkspis lower bounded away from zero at initialization. So k wtDθ(x)kF is upper bounded at initialization. Moreover, we observe empirically that kwtksp is usually increasing during training. Therefore, k wtDθ(x)kF is typically upper bounded during training as well. The following proposition states that spectral normalization also gives an upper bound on kHwi(Dθ)(x)ksp for networks with ReLU or leaky ReLU internal activations.